Conversation

Member

@am11 commented Jan 8, 2026

Fixes #116375.

@dotnet-policy-service bot added the community-contribution label Jan 8, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@am11 force-pushed the feature/il_throw_PUSH_COOP_PINVOKE_FRAME branch from 1a243a6 to 00c4f6a on January 8, 2026 22:39
@am11 force-pushed the feature/il_throw_PUSH_COOP_PINVOKE_FRAME branch from b8782d2 to 0d677dd on January 8, 2026 22:41
@am11 force-pushed the feature/il_throw_PUSH_COOP_PINVOKE_FRAME branch from 0d677dd to 5359b4c on January 8, 2026 22:42
// the xmm registers are not supported by the libunwind
.endm

// Unaligned version for use when stack alignment cannot be guaranteed
Member

What prevents us from ensuring that the stack is sufficiently aligned?

Member Author

The constraint is that GetOffsetOfFloatArgumentRegisters() returns a fixed -128. TransitionBlock must point to ArgumentRegisters at rsp+136, so the floats must be at rsp+136-128 = rsp+8. After the call, 12 pushes, and alloc_stack 136, rsp is 16-byte aligned, but rsp+8 is not. Without changing the TransitionBlock layout (which would require changes across many files), we can't achieve 16-byte alignment for the float saves. Using unaligned stores on this exception-throw path is acceptable since it is not performance-critical.
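
The arithmetic in this comment can be checked at compile time. The following is a minimal sketch, not code from the PR: all constant and function names are hypothetical, the numbers are the ones quoted above, and it assumes rsp is 16-byte aligned immediately before the `call`.

```cpp
#include <cstdint>

// Hypothetical constants modelling the numbers quoted in the comment above.
constexpr std::int64_t kReturnAddress   = 8;    // pushed by `call`
constexpr std::int64_t kPushedRegisters = 12;   // 8-byte pushes in the prolog
constexpr std::int64_t kAllocStack      = 136;  // alloc_stack 136
constexpr std::int64_t kFloatOffset     = -128; // GetOffsetOfFloatArgumentRegisters()

// Bytes subtracted from rsp between the caller's `call` and the end of the prolog.
constexpr std::int64_t TotalAdjustment()
{
    return kReturnAddress + kPushedRegisters * 8 + kAllocStack;
}

// rsp-relative offset of the float save area: ArgumentRegisters sits at
// rsp+136 and the floats sit kFloatOffset bytes away from it.
constexpr std::int64_t FloatAreaOffsetFromRsp()
{
    return kAllocStack + kFloatOffset;
}

constexpr bool IsAligned16(std::int64_t offset)
{
    return offset % 16 == 0;
}

// Assumption: rsp is 16-byte aligned just before the `call`, so rsp stays
// aligned exactly when the total adjustment is a multiple of 16.
static_assert(TotalAdjustment() == 240, "8 + 12*8 + 136");
static_assert(IsAligned16(TotalAdjustment()), "rsp is aligned after the prolog");
static_assert(FloatAreaOffsetFromRsp() == 8, "floats land at rsp+8");
static_assert(!IsAligned16(FloatAreaOffsetFromRsp()), "rsp+8 is misaligned");
```

Under these assumptions the float save area is only 8-byte aligned, which is why the comment falls back to unaligned stores.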

Member

I am not sure I understand. PROLOG_WITH_TRANSITION_BLOCK pushes the same set of registers, and it has the floats at aligned locations. Actually, why can't we use PROLOG_WITH_TRANSITION_BLOCK to do the work here? It seems we are pushing exactly the same set of registers that the macro does.

Member Author

@janvorli, I tried PROLOG_WITH_TRANSITION_BLOCK and reverted it because it broke linux-arm64, win-arm64 and linux-arm, which had been passing before. PROLOG_WITH_TRANSITION_BLOCK doesn't save the FP callee-saved registers (d8-d15 on both ARM64 and ARM32). Exception dispatch needs these values to correctly restore the context during unwinding. We could save them after PROLOG_WITH_TRANSITION_BLOCK, but that adds complexity. The custom macro keeps everything self-contained.

Member

I am not sure I understand where the added complexity is. Saving the callee-saved FP registers seems additive to PROLOG_WITH_TRANSITION_BLOCK. As for the alignment, this is the stack layout on Windows; you can see there is padding to align the floats.

; r9
; r8
; rdx
; rcx                       <- __PWTB_ArgumentRegisters
; return address
; CalleeSavedRegisters::r15
; CalleeSavedRegisters::r14
; CalleeSavedRegisters::r13
; CalleeSavedRegisters::r12
; CalleeSavedRegisters::rbp
; CalleeSavedRegisters::rbx
; CalleeSavedRegisters::rsi
; CalleeSavedRegisters::rdi <- __PWTB_StackAlloc
; padding to align xmm save area
; xmm3
; xmm2
; xmm1
; xmm0                      <- __PWTB_FloatArgumentRegisters
; extra locals + padding to qword align

Similar on Unix x64:

// return address
// CalleeSavedRegisters::rbp
// CalleeSavedRegisters::rbx
// CalleeSavedRegisters::r15
// CalleeSavedRegisters::r14
// CalleeSavedRegisters::r13
// CalleeSavedRegisters::r12
// ArgumentRegisters::r9
// ArgumentRegisters::r8
// ArgumentRegisters::rcx
// ArgumentRegisters::rdx
// ArgumentRegisters::rsi
// ArgumentRegisters::rdi    <- __PWTB_StackAlloc, __PWTB_TransitionBlock
// padding to align xmm save area
// xmm7
// xmm6
// xmm5
// xmm4
// xmm3
// xmm2
// xmm1
// xmm0                      <- __PWTB_FloatArgumentRegisters
// extra locals + padding to qword align

I've looked into it: TransitionBlock::GetOffsetOfFloatArgumentRegisters is actually never called on Windows x64, and the value it returns would be incorrect there. On Unix x64 the function is called, but it seems that the result is not used for anything (verifying now). That would explain why the incorrect offset doesn't hurt.
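
The Unix x64 listing above can be sanity-checked with a small compile-time model. This is a sketch under stated assumptions, not the macro itself: the names are hypothetical, the register counts are read off the listing, and depths are measured downward from the caller's rsp just before the `call`, assumed to be 16-byte aligned. The listing does not state the padding size; 8 bytes is what the arithmetic requires.

```cpp
// Hypothetical model of the Unix x64 layout shown in the listing above.
constexpr int kSlot = 8;               // one pushed general-purpose register
constexpr int kXmm = 16;               // one saved xmm register
constexpr int kCalleeSaved = 6;        // rbp, rbx, r15, r14, r13, r12
constexpr int kArgumentRegs = 6;       // r9, r8, rcx, rdx, rsi, rdi
constexpr int kFloatArgumentRegs = 8;  // xmm7 .. xmm0

// Depth of the rdi slot (labelled __PWTB_TransitionBlock in the listing)
// below the caller's aligned rsp: return address plus twelve pushes.
constexpr int TransitionBlockDepth()
{
    return kSlot + (kCalleeSaved + kArgumentRegs) * kSlot;
}

// Depth of the xmm0 slot, given `padding` bytes between rdi and xmm7.
constexpr int Xmm0Depth(int padding)
{
    return TransitionBlockDepth() + padding + kFloatArgumentRegs * kXmm;
}

constexpr bool IsAligned16(int depth)
{
    return depth % 16 == 0;
}

static_assert(TransitionBlockDepth() == 104, "8 + 12*8");
static_assert(!IsAligned16(Xmm0Depth(0)), "no padding: xmm area is misaligned");
static_assert(IsAligned16(Xmm0Depth(8)), "8 bytes of padding aligns the xmm area");
```

With the padding in place every xmm slot sits on a 16-byte boundary, which is consistent with the remark that the macro keeps the floats at aligned locations.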

Member

I think I was right. I've changed GetNegSpaceSize as follows, and all coreclr tests on Linux x64 still pass:

    static int GetNegSpaceSize()
    {
        LIMITED_METHOD_CONTRACT;
        int negSpaceSize = 0;
#ifdef CALLDESCR_FPARGREGS
        negSpaceSize += sizeof(FloatArgumentRegisters);
#endif
#if defined(TARGET_ARM) || defined(TARGET_AMD64)
        negSpaceSize += TARGET_POINTER_SIZE; // padding to make FloatArgumentRegisters address properly aligned
#endif
        return negSpaceSize;
    }
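
To see what the modified function evaluates to, here is a standalone model of it. It is only a sketch: `Xmm`, `kTargetPointerSize`, and the `withPadding` parameter are hypothetical stand-ins, and it assumes a Unix x64 build where CALLDESCR_FPARGREGS is defined and FloatArgumentRegisters holds eight 16-byte registers, as in the listing earlier in this thread.

```cpp
// Stand-ins for CoreCLR definitions; the values are assumptions for Unix x64.
constexpr int kTargetPointerSize = 8;  // models TARGET_POINTER_SIZE

struct alignas(16) Xmm
{
    unsigned char bytes[16];
};

struct FloatArgumentRegisters
{
    Xmm d[8];  // xmm0 .. xmm7
};

// Models GetNegSpaceSize with the CALLDESCR_FPARGREGS branch taken.
// `withPadding` selects whether the TARGET_AMD64 padding branch applies.
constexpr int GetNegSpaceSize(bool withPadding)
{
    return static_cast<int>(sizeof(FloatArgumentRegisters))
         + (withPadding ? kTargetPointerSize : 0);
}

static_assert(GetNegSpaceSize(false) == 128, "float registers only");
static_assert(GetNegSpaceSize(true) == 136, "float registers plus padding");
```

In this model the change grows the negative space from 128 to 136 bytes, the size of the xmm save area plus one pointer-sized padding slot.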


// Set the last thrown object before dispatching the exception.
// This is required for exception handling code that checks LastThrownObject.
pThread->SafeSetLastThrownObject(oref);
Member

This doesn't seem right. I don't understand why this change would require adding this. The last thrown object is set in the EH code when the first pass starts, as part of Thread::SafeSetThrowables.

Member Author

I was experimenting with it for the win-x64 SEH issue. That's the only platform failing tests at the moment, because something is corrupting MXCSR. I am debugging it with cdb now, but since I'm not a Windows expert, I'd need some help.

This change was just a "maybe that would fix win-x64" experiment, not something any platform (including win-x64) needs. Reverted.

Member Author

Let's see if 22571b2 fixes it. I spent the whole evening debugging with cdbx64.exe to get the codegen tests to pass. 😅

Labels

area-VM-coreclr, community-contribution (indicates that the PR has been added by a community member)

Development

Successfully merging this pull request may close these issues.

Bring IL_{Throw,Rethrow,ThrowExact} to PUSH_COOP_PINVOKE_FRAME plan

3 participants